4 research outputs found

    GenomeScope 2.0 and Smudgeplot for Reference-Free Profiling of Polyploid Genomes

    An important assessment prior to genome assembly and related analyses is genome profiling, where the k-mer frequencies within raw sequencing reads are analyzed to estimate major genome characteristics such as size, heterozygosity, and repetitiveness. Here we introduce GenomeScope 2.0 (https://github.com/tbenavi1/genomescope2.0), which applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. We describe and evaluate a practical implementation of the polyploid-aware mixture model that quickly and accurately infers genome properties across thousands of simulated and several real datasets spanning a broad range of complexity. We also present a method called Smudgeplot (https://github.com/KamilSJaron/smudgeplot) to visualize and estimate the ploidy and genome structure of a genome by analyzing heterozygous k-mer pairs. We successfully apply the approach to systems of known variable ploidy levels in the Meloidogyne genus and the extreme case of octoploid Fragaria × ananassa.
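
The k-mer frequency spectrum that genome profiling starts from can be illustrated with a minimal sketch. This is not GenomeScope's code: the function names `canonical` and `kmer_spectrum` are hypothetical, the reads are toy strings, and real analyses typically rely on a dedicated k-mer counter rather than pure Python. The sketch only shows how reads become a histogram of k-mer multiplicities, which is the kind of input a mixture model would then be fitted to.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_spectrum(reads, k):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often.

    `reads` is any iterable of DNA strings (a hypothetical, in-memory stand-in
    for a sequencing read file). K-mers containing characters other than
    A, C, G, T are skipped.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    # Histogram: multiplicity -> number of distinct k-mers with that multiplicity.
    return Counter(counts.values())


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "CGTACG"]  # illustrative data only
    for multiplicity, n_kmers in sorted(kmer_spectrum(toy_reads, 3).items()):
        print(multiplicity, n_kmers)
```

Counting canonical k-mers merges each k-mer with its reverse complement, so both strands of the same locus contribute to one count.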

    Genomic Features of Parthenogenetic Animals

    Evolution without sex is predicted to impact genomes in numerous ways. Case studies of individual parthenogenetic animals have reported peculiar genomic features that were suggested to be caused by their mode of reproduction, including high heterozygosity, a high abundance of horizontally acquired genes, a low transposable element load, or the presence of palindromes. We systematically characterized these genomic features in published genomes of 26 parthenogenetic animals representing at least 18 independent transitions to asexuality. Surprisingly, not a single feature was systematically replicated across a majority of these transitions, suggesting that previously reported patterns were lineage-specific rather than illustrating the general consequences of parthenogenesis. We found that only parthenogens of hybrid origin were characterized by high heterozygosity levels. Parthenogens that were not of hybrid origin appeared to be largely homozygous, independent of the cellular mechanism underlying parthenogenesis. Overall, despite the importance of recombination rate variation for the evolution of sexual animal genomes, the genome-wide absence of recombination does not appear to have had the dramatic effects that are expected from classical theoretical models. The reasons for this are probably a combination of lineage-specific patterns, the impact of the origin of parthenogenesis, and a survivorship bias of parthenogenetic lineages.

    Optimized sample selection for cost-efficient long-read population sequencing

    An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g., microarrays, exome capture, short-read WGS), from which a few individuals are resequenced using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, studies of human variation have historically focused on individuals with European ancestry, but this represents a small fraction of the overall diversity. Addressing this, SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies. It then computes a ranked list of samples that maximizes the total number of variants present within a subset of a given size. To solve this optimization problem, SVCollector implements a fast, greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector to simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3000 Rice Genomes Project and show the rankings it computes are more representative than alternative naive strategies. When selecting an optimal subset of 100 samples in these cohorts, SVCollector identifies individuals from every subpopulation, whereas naive methods yield an unbalanced selection. Finally, we show the number of variants present in cohorts selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.
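
The greedy ranking described in this abstract can be sketched as a standard greedy maximum-coverage heuristic. This is an illustration under stated assumptions, not SVCollector's implementation: the function `greedy_rank` is hypothetical, and the input is a plain dictionary mapping sample names to sets of variant identifiers rather than a parsed VCF file. At each step the sample that adds the most not-yet-covered variants is selected; ties are broken alphabetically here purely to keep the output deterministic.

```python
def greedy_rank(sample_variants, n):
    """Rank up to `n` samples by how many new variants each one adds.

    `sample_variants` maps a sample name to the set of variant IDs it carries
    (a hypothetical stand-in for genotypes read from a population VCF).
    Returns a list of (sample, newly_covered, cumulative_covered) tuples.
    """
    remaining = dict(sample_variants)
    covered = set()
    ranking = []
    while remaining and len(ranking) < n:
        # max() keeps the first maximal element, so iterating over sorted
        # names breaks ties alphabetically.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = len(remaining[best] - covered)
        covered |= remaining.pop(best)
        ranking.append((best, gain, len(covered)))
    return ranking


if __name__ == "__main__":
    # Toy genotypes, for illustration only.
    genotypes = {
        "S1": {"v1", "v2", "v3"},
        "S2": {"v3", "v4"},
        "S3": {"v1", "v2"},
        "S4": {"v5"},
    }
    for sample, gain, total in greedy_rank(genotypes, 3):
        print(sample, gain, total)
```

A greedy choice of this kind is fast but not guaranteed to be optimal, which is consistent with the abstract's mention of a separate exact algorithm based on integer linear programming.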